Character-level Convolutional Network for Text Classification Applied to Chinese Corpus
نویسندگان
چکیده
Compared with word-level and sentence-level convolutional neural networks (ConvNets), the character-level ConvNets has a better applicability for misspellings and typos input. Due to this, recent researches for text classification mainly focus on character-level ConvNets. However, while the majority of these researches employ English corpus for the character-level text classification, few researches have been done using Chinese corpus. This research hopes to bridge this gap, exploring character-level ConvNets for Chinese corpus test classification. We have constructed a large-scale Chinese dataset, and the result shows that character-level ConvNets works better on Chinese character dataset than its corresponding pinyin format dataset, which is the general solution in previous researches. This is the first time that character-level ConvNets has been applied to Chinese character dataset for text classification problem.
منابع مشابه
Chinese Medical Question Answer Matching Using End-to-End Character-Level Multi-Scale CNNs
This paper focuses mainly on the problem of Chinese medical question answer matching, which is arguably more challenging than open-domain question answer matching in English due to the combination of its domain-restricted nature and the language-specific features of Chinese. We present an end-to-end character-level multi-scale convolutional neural framework in which character embeddings instead...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملVery Deep Convolutional Networks for Text Classification
The dominant approach for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which are very successful in computer vision. We present a new architecture for text processing which operates directly on the character level and uses only small convoluti...
متن کاملCharacter-level Convolutional Networks for Text Classification
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several largescale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and de...
متن کاملWhich Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?
This article offers an empirical study on the different ways of encoding Chinese, Japanese, Korean (CJK) and English languages for text classification. Different encoding levels are studied, including UTF-8 bytes, characters, words, romanized characters and romanized words. For all encoding levels, whenever applicable, we provide comparisons with linear models, fastText (Joulin et al., 2016) an...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1611.04358 شماره
صفحات -
تاریخ انتشار 2016